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ABSTRACT 

Direct detection of gravitational waves by pulsar timing arrays will become feasible over the next few years. 
In the low frequency regime (10~ 7 Hz - 10~ 9 Hz), we expect that a superposition of gravitational waves from 
many sources will manifest itself as an isotropic stochastic gravitational wave background. Currently, a number 
of techniques exist to detect such a signal; however, many detection methods are computationally challenging. 
Here we introduce an approximation to the full likelihood function for a pulsar timing array that results in 
computational savings proportional to the square of the number of pulsars in the array. Through a series of 
simulations we show that the approximate likelihood function reproduces results obtained from the full likeli- 
hood function. We further show, both analytically and through simulations, that, on average, this approximate 
likelihood function gives unbiased parameter estimates for astrophysically realistic stochastic background am- 
plitudes. 



1. INTRODUCTION 



Gravitational waves (GWs) will very likely be d etected 
in the next few years. Pulsar timing arrays (PTAs) (Hobbs 
et al.|2010) as we ll as ground-bas ed interferometers such as 



Advanced LIGO ([Waldman 2 01 1[ ) are expected to make the 
first direct GW detection on a similar time-scale, though they 
are sensitive to different and complementary regions of the 
GW spectrum. Ground-based instruments are most sensi- 
tive around 100 Hz, and the most promising source at those 
frequencies are binaries of compact objects such as neutron 
stars and black holes (up to a few tens of solar masses). Pul- 
sar timing arrays are most sensitive around 10~ 9 Hz, and the 
most promising source at those frequencies are super-massive 
binary black holes (SMBBHs) that coalesce when galaxies 
merge. 

All the SMBBH mergers that have taken place throughout 
the history of our uni verse produce a stoc hastic backgr ound of 
gravitational waves (|Lommen & Backer 2001 ; Jaffe & Backer 
2003) |Wyithe & Loeb||2003[ |Volonteri et al.||2003[ |Enoki 
et al. |2004| |Sesana et al.||2008| |Sesana||2012| ]Mc Williams 



et al.||20 F2|, as well as individual periodic signals that may 



be detectable as above the confusion noise (|Sesana et al. 
2009]j Sesana & Vecchio|2010MRoedig & Sesana|201 l||Ravi 



et al.|2012||Mingarelli et al.|2012), and bursts ( |van Haasteren 



& Levin||2010| |Cordes & Jenet||2012) l. A number of tech 



niques have been implemented to search pulsar timing data for 
the stochastic background {De tweiler 1979; Stinebrin g et al. 

Anholm et al. 



2009 

Haasteren 



1990 



Lommen 2002 ; Jenet et al. 



van Haasteren et al.||2009afbj 
Ten et al.||2011| |CordesT~S 



2005112006 

Yardley et al.||2011 



van 



iannon||2012i Demorest 



~ " — - — < i — - — ~ " * 1 
et al.||2012[), as well as periodic s ignals (J enet et al. 2004; 



Yardle yera"l.|2010[|Corbin & Cornish|2010||Lee et al.|201l[ 
Ellis et al.||20'l2bUBabak & Sesana||2012||Ellis et al.||2012c[ 
Petit eau et al.|2012| l, and bursts ( |Finn & Lom men 20K)][ 

For stochastic background searches, evaluations of the full 
likelihood are computationally challenging. PTAs are cur- 
rently timing up to a few tens of pulsars, with several thousand 
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points each. In addition, the likelihood function depends not 
only on the relatively small number of parameters that charac- 
terize GW stochastic background, but also on several intrinsic 
red and white noise parameters for each pulsar. A number of 
techniques have already been introd uced to reduce the compu- 
tational burden of such searche s ( |van Ha asteren 2 012||Lentati| 
|et al.|2012] |Taylor et al.|2012| , and we will discuss these re- 
sults later in the paper. 

Although the stochastic background produces random 
changes in the times-of-arrival (TOAs) of an individual pul- 
sar, the cross-correlation of its effects on two puls ars only de- 
pends on the angular separation between pulsars (Hellin gs~&| 
|Downs|1983| l. In this paper we introduce an efficient approx- 
imation to the likelihood by using an expansion to first or- 
der in the amplitu de of t he cross-correlation terms introduced 
by Anholm et al. (2009 1. This technique has already used to 
analyze the first International Pulsar Timing Array Mock Data 
Challenge (Elli s et al.|2012a[ ). The approximation affords us 
a computational savings quadratic in the number of pulsars in 
the pulsar timing array, a factor of a one to three orders of 
magnitude, depending on the size of the PTA. 

This paper is organized as follows. In Section [2] we give 
an overview of the timing model, in Section [3] we write the 
likelihood function for the parameters of the stochastic back- 
ground as well as intrinsic noise parameters of the pulsars, and 
introduce the first order approximation in the amplitude of the 
cross-correlations, in Section [4] we show the effectiveness of 
our approximation using simulated gravitational wave back- 
grounds, and that the level of bias introduced by our approx- 
imation is negligible for astrophysically reasonable stochas- 
tic background amplitudes. We conclude in Section [5] with 
a summary of our results, compare our results to other work 
to increase the computational efficiency of stochastic back- 



ground searches (|van Haasteren 2012| Lenta ti et al. 2012 
|Taylor et al. |2012) , and introduce a technique that can be 



Be 

used to search for a combination of continuous wave signals 
and stochastic backgrounds, a possibility suggested by recent 
work (Ravi et al. 2012), which will be the basis for future 
work. 



2. THE TIMING MODEL 
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In pulsar timing we measure the times-of-arrival (TOAs) 
of radio pulses emitted from pulsars. These TOAs contain 
many terms of known functional form (pulsar period, spin- 
down, etc.), radiometer noise, pulse phase jitter, and possi- 
bly re d noise either from ISM effects, intrinsic pulsar spin 
noise (Shannon & Cordes 2010), or a stochastic gravitational 
wave background (GWB). Let the TOAs for a pulsar be given 
by 



1 =t (€true) + "> 



(1) 



where f° bs is the observed TOA, f det is the deterministic mod- 
eled TOA parameterized by timing model parameters £ true , 
and n is the noise in the measurement which we will assume 
to be Gaussian. We will discuss the exact form of the covari- 
ance matrix for the noise n in the next section. Assuming we 
have an estimate of the true timing model parameters, £ est (ei- 
ther from information gained when discovering the pulsar or 
past timing observations), then we can form the post fit resid- 
uals as follows 



5f P» = f° bS -f det (lest) = ^^trueW^estH" 
* det (£true) 



= t de \ti tme )-t del (£ lme + 60 + n 



85£ 
dt det (£ tiue ) 



5£ + n + 0(6t) 



S£=0 



(2) 



ase, 

= M5g + n 7 



5£ + n 



S£=0 



where M is called the design matrix and we have assumed 
that our initial estimate of the model parameters is sufficiently 
close to the true values that we can approximate this as a lin- 
ear system of equations in 5£. In standard pulsar timing anal- 
ysis, it is customary to obtain the best fit 6£ values through 
a weighted least squares minimization of the pre-fit residuals. 
In the most general case we should be performing a general- 
ized least squares fit using a general covariance matrix for the 
noise n; however, in most cases we have no a priori knowledge 
of this covariance matrix and therefore assume that it is just 
diagonal with elements o f, where ay is th e uncertainty of the 
2th TOA. Previous work (Coles et al.|201 1) > has used an itera- 
tive method to estimate the covariance matrix of the residuals 
and apply a generalized least squares fit, however; for this 
work we will only work with residuals that have been created 
using a weighted least squares fit, since that is the standard 
procedure in pulsar timing residual generation. The v alue of 
chi-squared can be written in the following way (see |Hobbs| 
|etaLlPD55) ) 

"(~) 



N 
i=l 



(3) 



Defining W = 1/cr; we can minimize \ 2 

0= %- = W 2 (M8$, + n)M T 
dob, 



M T W 2 n = -M' W Z M6$, 



to obtain our best fit model parameters 



best ' 



(m t w 2 m) 



M'W z n. 



(4) 



(5) 



Here we have made the choice to include the weights, W, 
since TEMP02 does a weighted fit and we want to reproduce 



the fitting procedure as accurately as possible. Finally we ob- 
tain the post fit residuals by substituting the best fit parameters 
into Eq. [2] 



r^ osl = M5^ t + n 
=*> r = Rn, 



(6) 



where r is just shorthand notation for the post-fit residuals and 



R = l-M(M T W 2 M) 'M l W\ 



Tnr2 



(7) 



is a an oblique projection operator that transforms pre-fit to 
post-fit residuals and I is the identity matrix. All of the infor- 
mation about any noise source or stochastic GWB is encoded 
in n, however; we can never measure n directly because we 
must perform the timing model subtraction. Because of this 
we seek to work exclusively in terms of our observable quanti- 
ties, r. It should be noted that in standard pulsar timing anal- 
ysis this process must be iterated. In other words we form 
pre-fit residuals from our initial guess of the parameters, we 
then minimize the chi-squared to get our best estimates of the 
parameters, however this may not be a good fit because we 
have assumed that the pre-fit residuals are linear in the pa- 
rameter offsets. Thus, we then form new parameter estimates 
from the best fit parameter offsets and iterate until the fit con- 
verges where the reduced chi-squared is used as our goodness 
of fit parameter. 

3. THE LIKELIHOOD FUNCTION 

The likelihood function for the timing residuals may be de- 
rived very simply from the likelihood of the underlying pre-fit 
Gaussian random processes. In this section we will derive an 
expression for the likelihood and introduce our approxima- 
tion. We will also show that, in a frequentist sense, the maxi- 
mum of the expectation value of the likelihood function is an 
unbiased estimator of the noise parameters in the low-signal 
regime. 

Since we have assumed that our noise n is Gaussian and 
stationary, for a pulsar timing array with M pulsars we can 
write the probability distribution as the multi-variate Gaussian 



p(n\0)-. 
where where 



v / det(27rS„) 6XP ( 2" S " n ) 



"i 

"2 

n M . 



(8) 



(9) 



is a vector of the noise time-series, « Q (f), for all pulsars, £„ is 
the pre-fit noise covariance matrix and 8 is a set of parameters 
that characterize the noise. However, as we noted above, we 
do not actually measure n, we measure the timing residuals 
r = Rn where 

17? i ... 
R 2 ... 



R = 

.0 ... R M . 
We compute the likelihood for r as follows. Let 

dn 



(10) 



p(r\6)dr = p(n\6)dn^p(r\6) = p(n\6) 



dr 



(11) 
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where | • | represents the determinant. We evaluate the Jaco- 
bian by assuming that R is invertible and writing n = R~'r, 
therefore 



dn 

dr 



= R = 



1 

1ST 



1 



v/det(RR r ) ' 



(12) 



Substituting this result into Eq. 1 1 we obtain 
1 ( 1 



p(r|0) = 



^det(27rRS„R 7 ') 



exp 



r(R- 1 ) r S; 1 R-r 



(13) 

The product RE„R r is just the covariance matrix for the resid- 
uals 



(rr 



R(nn r )R r 



(14) 



so that the likelihood in terms of the timing residual data is 
simply 



P(r\9) = 



1 



Vdet(27rS) 



exp 



1 



r 1 E~ 



(15) 



The inverse of E does not formally exist since we have re- 
moved degrees of freedom by fitting out the timing model. In 
practice, we can make use of a singular value decomposition 
to compute the determinant and pseudoinverse to evaluate the 
likelihood. Viewed in this way, the likelihood function for the 
residuals is simply a change of coordinates where R is a linear 
(but not invertible) map from n — > r = Rn. 

The covariance matrix for the timing residuals is the block 
matrix, 

Pi Sn ...Sim 

$21 Pi ■■■ Sim 

(16) 



-A/ 



where 



Pa — \ r a r ot)i 



Sa/3 : 



(17) 
(18) 



are the auto-covariance and cross-covariance matrices, re- 
spectively, for each set of residuals. It is very important to 
note that we work exclusively in the post-fit variables. As 
above we use the post-fit residuals, r a = R a n a and the post- 
fit auto- and cross-correlation matrices, P a = R a Pa efa R T a and 
Sa/3 = PaS v ^R T p. Henceforth, we will drop any mention of 
pre-fit or post-fit as we will only work with post-fit variables. 

It is worth pointing out that this treatm ent is somewhat dif- 
ferent f rom previous Bayesian analyse s (|van Haastere n et al. 
2009al |van Haasteren & Levin 20101 |van Haasteren et aT 
2011) 1 (VHML). We use a conditional pdf whereas VHML 
used a marginalized pdf. In other words, we fix the best fit pa- 
rameter offsets, <5£ best through our use of the projection matrix 
R, whereas VHML marginalizes over the parameter offsets 6£ 
(See Appendix |A] for more details). 

We would like to use the likelihood to determine the spec- 
tral index, 7 gw , and amplitude, A gw , of the stochastic back- 
ground from our data. The GW parameters are the same for 
all pulsars. In addition, each pulsar will have intrinsic noise 
parameters as well. The intrinsic pulsar timing noise is nor- 
mally parametrized with four parameters: an amplitude A a 
and spectral index j a for a power law red noise process, and 
EFAC and EQUAD parameters, T a and Q a , for white noise 



processes. In general the EFAC parameter is a multiplicative 
factor representing any systematic effects in the uncertainty in 
each TOA based on the c ross correlation of the folded pulse 
profile with a template ( [Taylor et af1[T992| . The EQUAD 
parameter is an extra white noise parameter that is added to 
the TOA error in qu adrature and could repre sent the expected 
pulse phase jitter ( Cordes & Shannon|20 10) and other white 
noise processes that are un-accounted for. Therefore, we write 
our auto-covariance as a sum of a common GWB term and a 
pulsar dependent term 



P a =N a +S a 



(19) 



where N a is the intrinsic noise auto-covariance matrix and S aa 
is the common GWB auto-covariance matrix for pulsar a. It 
is convenient to work in a block matrix notation where 



E = N + S fl + S e = P+S c 



(20) 



where P is a block diagonal matrix with diagonals P a and S c is 
block matrix with off diagonals S a p, and zero block matrices 
on the diagonal. 

We will now quickly show that, in a frequentist sense, the 
maximum of the expectation value of the likelihood func- 
tion is an unbiased estimator of our signal parameters 9 = 
{A gw j Tgw i A Q , 'fa ,J~ a , Q a } . We write the log likelihood func- 
tion as 

1 



ln£ = - 



TrlnE + r^E-'r 



(21) 



where we have used the fact that lndet(A) = Trln(A) for a gen- 
eral matrix, A. To show that the maximum of the expecta- 
tion value of this likelihood function is an unbiased estimator 
of the signal parameters, 9, we wish to show that it is maxi- 
mized, on average, for signal parameters 9 = 9 lme . Taking the 
expectation value we obtain 



(ln£) 



1 



Tr 



lnE + XE" 



(22) 



where X = (rr T ) is the covariance matrix of the data. Defining 
di = d/d9j we obtain 



di(]n£) 



-Tr 



E-^/E-XE-^iEE" 1 



(23) 



Assuming that our noise model is correct, we have X = E and 



E 9jE-9,EE~ 



= 0, 



(24) 



where we have used the fact that Tr(AB) = Tr(BA) for general 
matrices, A and B. Therefore, the maximum of the expectation 
value of the likelihood function is an unbiased estimator of 
our model parameters 9. 

3.1. Likelihood with first order approximation 

In practice the matrix E is quite large and therefore, compu- 
tationally prohibitive to invert. Since many multi-frequency 
residual datasets now have on the order of 10 3 points, for 
many modern PTAs the matrix E will be of order 10 4 x 10 4 . 
We would like to avoid inverting the full covariance matrix 
if at all possible. First let us rewrite the cross-covariance as 
Sc,a/3 = Ca/3S Q/ 3, where S a p is the temporal cross covariance 
between pulsar a and pulsar f3. The coefficients represent the 
spatial correlations and are given by the Hellings and Downs 



4 



coefficients 



3 1-cos£ Q/9 /l-cos£ Q/3 
1 l-cos£ Q £ 1 1 

Onfi, 

4 2 2 2 aP ' 



(25) 



where £ Q) g is the angular separation of pulsars a and /3, and 
(5 Q( g is the Kronecker delta. We denote P = <5 a ^P Q /3 as the 
auto-covariance matrix of pulsar a describing the noise and 
auto-covariance of the GWB. We then use the following no- 
tation to form matrices from indexed quantities: P = {P a p}- 
Now, we perform the expansion of XT 1 in terms of the coeffi- 
cients ( a /3 




i+p-'K^s^})" p- 1 



P^flS^P,, 



(26) 



+ < E C^C^p^s^p-^s^p-^ \+0(C 3 y 

It is also possible to expand the determinant term in a similar 
fashion 

IndetS = TrlnS = Trln(P+ {(^ a pS a p}) 
= Tr[lnP+ln(I+p- 1 {Ca/3S Q0 })] 



:Tr 



lnP+p-^C^S^} 



{ C^MCjUfPa/jS^P^S^vP^ 



(27) 



Here, the order 0(Q term is zero because P is block diagonal 
and {S Q ^} is block traceless and the trace of the product of 
a diagonal matrix and traceless matrix vanishes. If we ignore 
all terms of £ 2 and higher order and return to our original 
notation then we see that 



5T 1 ssp-'-p-Xp-' + OOC 2 ) 
IndetS wTrlnP+C(C 2 ). 



(28) 
(29) 



This derivation may give us the sense that this expansion may 
hold true for all GWB amplitudes; however, this is not true 
as we will now show. Although we have written this approx- 
imation in terms of an expansion in the Hellings and Downs 
coefficients, it is also useful to think of it as an expansion in 
the amplitu de of the GWB . Indee d, that it how it was con- 
ceived of in Anholm et al. (2009). We have not performed 
a true first order expansion however, since the inverse of the 
auto-correlations matrix P" 1 = (N+A 2 A a )~ l contains terms 
of infinite order in the amplitude. We can essentially think of 
the 0(C) terms in Equations 28 and 29 as the corrections to the 



amplitude parameter when we have a spatially correlated sig- 
nal. Thus, we have truncated these correction terms at 0(A 2 OVI ) 
and we would not expect this approximation to hold as A gw 
becomes large with respect to the intrinsic noise in the pulsar 
as we will show in Section|4] With these approximations, it is 



now possible to write the approximate log-likelihood 
\nC = -- [TrlnP+r^p-'r-r^p-'ScP-'r] 



M 



TrlnP a + r T a P~ l r a 



(30) 



Ma 

In the second line we have explicitly written out the sum over 
pulsars and pulsar pairs in order to highlight the fact that we 
only need to invert the individual auto-covariance matrices 
as opposed to the inverse of the full block covariance ma- 
trix, thereby, significantly reducing the computational cost of 
a single likelihood evaluation. Consider a PTA with M pulsars 
with N TOAs each. For a full likelihood evaluation we must 
perform one Cholesky inversion of the full covariance matrix 
which scales like ~ a(MN) 3 and ~ M 2 matrix multiplications 
which scale like ~ (3N^ . However, one evaluation of the first 
order likelihood requires M Cholesky inversions which scale 
like ~ aN 3 and M matrix multiplications which, again, scale 
like ~ (3N 3 . Though benchmarking tests we have found that 
(3 ~ 10a and thus the matrix multiplications will dominate 
both likelihood calls for a reasonable sized PTAs (M < 100) 
resulting in a computation speedup factor of ~ (a/(3)M 2 . 

It is possible to analytically show that the maximum of the 
expectation value of this approximate likelihood is an unbi- 
ased estimator in the same manner as above. First we take the 
expectation value of the log-likelihood 

(ln£> =-^Tr[lnP+XP- I -XP- 1 S e P" 1 ] (31) 
and then take a derivative with respect to a model parameter 



di(\nC) 



1 



-Tr 



p-ap-xp-app" 



+ xp 1 app 1 s c p _1 - xp 1 as c P" 



+XP" 1 S C P" I <9,PP" 



(32) 



Here we will work in the small signal regime where A„ w is 
small compared to the amplitude of the intrinsic noise. As- 
suming that we have modeled the covariance matrix correctly, 
we have X = S. Writing out the explicit amplitude depen- 
dence we assume 



P: 



N+A^A^P" 1 ? 
N+A gw A+Ag W C, 



-A^N-'AN" 1 



(33) 
(34) 



where N, A, and C are the auto-covariance of the noise, the 
auto-covariance of the GWB and the cross-covariance of the 
GWB, respectively. Then, to first order in A 2 V we have 



di(]nC) : 



1 



Tr 



N~ 9/(Ag W A) 



-N- I a(A 2 w A)-a(A 2 w C)N- 1 



(35) 



= 0, 



where the first two terms cancel and the third term is the trace 
of the product of a diagonal matrix and a traceless matrix. 
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Thus, to first order in Ag W , the maximum of the expectation 
value of this approximate likelihood is an unbiased estimator 
of the our signal parameters 9 in the weak signal limit. 

4. SIMULATIONS 

Here we will compare our first order likelihood approxi- 
mation to the full likelihood of VHML and perform mock 
searches of simulated data with and without an injected 
stochastic GWB in order to demonstrate its efficacy. We 
will also perform monte-carlo simulations to test the con- 
sistency of our likelihood function. These simulations are 
solely meant as a proof of principle and do not claim to repro- 
duce all features of real PTA data (irregular sampling, jumps, 
time varying DM corrections, etc.). However, our analysis 
method makes no assumptions about sampling by operating 
in the time domain and takes all timing model parameters 
into account via the projection matrices introduced in Sec- 
tion [2] The application of this method to real NANOGrav 
and IPTA datasets will be the subject of future work. For all 
simulations in the present work we use TEMP02 and and the 
fake, GWbkgrd, general2 and designmatrix plugins 
to generate the residuals and the corresponding design matri- 
ces. All simulated white noise is solely radiometer noise at 
the level of 100 ns unless otherwise noted. 

4.1. Mock searches 

First we will perform a simple test to compare the first or- 
der likelihood of this work and the full likelihood of VHML. 
Here we use a PTA with 10 pulsars observed at a cadence of 
20 TOAs per year for 5 years where we have fixed the EFAC 
parameter to be one (all white noise is encompassed in error 
bars as simulated) and assume that there is no intrinsic red 
noise, resulting in a search over two parameter; the amplitude 
of the stochastic GWB, A, and the power spectral index, 7. 
For both cases a grid search was carried out with 100 points 
in each dimension and A £ (0, 1 x 10~ 14 ) for an injected value 
of A = 1 x 10" 15 and A £ (0,2 x 10" 14 ) for an injected value of 
A = 1 x 10~ 14 , all the while we have 76 [1,7]. The results are 
presented in Figure [T] where the contours denote the one, two 
and three sigma credible regions, the gray contours are from 
the VHML likelihood function and th e bla ck contours are 
from the first order likelihood. In Figure l(a)| we have injected 
a stochastic GWB with A = 1 x 10" 15 and 7 = 13/3. First we 
notice that the injected value (' x' marker) is well within the 
1 -sigma credible regions for both likelihood functions. We 
also see that the confidence contours are nearly identical, with 
the first order likelihood preferring slightly larger amplitudes 
and smaller spectral indices. This simulation indicates that 
the first order likelihood is a very good approximation to the 
full likelihood when our signal is relatively small, showing 
no discernible bias and faithfully reproducing nearly identical 
credible regi ons. 

In Figure 1(b) we have injected a stochastic GWB with 
A = 1 x 10~ 14 and 7 = 13/3. Again, the injected value lies 
within the 1 -sigma credible region, however; now we do no- 
tice a difference between two credible regions from the full 
and first order likelihoods. The first order likelihood is biased 
towards lower amplitudes and lower spectral indices. In fact 
we can almost see where the first order approximation begins 
to break down. Notice that the contours are nearly identical 
for lower amplitudes and deviate more with increasing ampli- 
tude. This behavior is not surprising in that we know that this 
likelihood is only unbiased to first order in the amplitude as 



shown in Section [3] In fact, it is impressive that this approx- 
imation performs this well with only a small bias in the large 
signal limit (even with timing residuals lower than 100 ns in 
many pulsars, the signal-to-noise-level of the data simulated 
here is well above any reasonable estimates for future PTA 
sens itivities.). This bias will be discussed further in Section 
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The simulations used in the work have been quite ideal 
and do not contain any systematic effects such as clock er- 
rors which can manifest as a c orrelated noise source with uni- 
form correlation coefficients (Yar dley et al.|[20TT) , errors in 
solar system ephemerides, which can manifest as dipole sig- 
nals in the resi duals, or new physics such as non-gr polariza- 
tion modes ( |Lee et a l. 2008, Chamberl in & Siemens|[20T2l > 



or massive gravitons ( |Lee et al.||2010[ ) which would change 
the shape of the Hellings and Downs curve. We have, for the 
most part, also assumed that the intrinsic pulsar noise can be 
assumed to be white gaussian noise with no discernible red 
noise. While previous work sugg ests that there will be red 
noise present in many MSPs ( Shan non & Cordes 2010|l, anal- 
yses of the present timing data (|van Haasteren et al. 201 1" 



|Perrodin et~aT|2013| |Ellis et al.||2013b suggest that the data 
is white noise dominated and there is little to no evidence for 
red noise. However further study of the model selection prob- 
lem taking in to account the aforementioned effects is crucial 
to present detection efforts and will be the subject of a future 
paper. 

4.2. The detection problem 

We now turn to the question of detection. In a Bayesian 
analysis we would like to compute the odds that there is a 
GWB present in our data. Not surprisingly, the tool normally 
used to this end is the Odds ratio of Bayes factors. Consider 
two models that we will label M\ and M%, then the Odds ratio 
is defined as 



= B(M u M 2 \r) 



P (M 2 y 



where 



B(M u M 2 \r) = 



JdhpjrlOuMriptfi) 
Jd0 2 p(r\6 2 ,M 2 )p(6 2 ) 



(36) 



(37) 



is the Bayes factor (i.e the ratio of the marginalized likelihood 
functions over parameters 6\ and 9 2 corresponding to models 
M\ and M 2 respectively), r is our data and p(M\ ) and p(M 2 ) 
are the a priori probabilities on models M\ and M 2 respec- 
tively. Note that the Bayes factor is the data dependent part 
of the odds ratio where the a priori probabilities of the models 
is somewhat subjective, and as such, we will only consider 
Bayes factors when discussing detection in the this workF] 
For our purposes, we would like to compare at least three dir- 
ferent models when weighing the odds of a stochastic GWB 
in our data: 

1 . M gw : A power law stochastic GWB with spatial correla- 
tions described by the Hellings and Downs coefficients 
( a /3, amplitude A gw and power spectral index 7 gw , in- 
dividual power law red noise processes for each pul- 
sar with amplitude A a and power spectral index 7 Q and 

3 It is possible to use astrophysical information such as the expected level 
of the stochastic background compared to our noise or the expectation num- 
ber of single sources to construct the a priori probabilities. Here we will 
quantify our ignorance by considering equal a priori probabilities of all tested 
models. 
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Figure 1. Comparison of full likelihood (gray) of jvan Haastere n et al. (2009a) and the first order likelihood (black), (a): 10 pulsars A =1x10 15 , (b): 10 pulsars 
A= 1 x 10~ 14 



white noise for each pulsar characterized by an EFAC 
parameter JF a and EQUAD parameter Q a . 

2. M con : A common red noise process among pulsars (as 
suggested in |Shannon & Cordes] ( |2010| l) with no spa- 
tial correlations and individual intrinsic red and white 
components as in model M gw . 

3- M nu ii: Only intrinsic red and white noise processes with 
no common red or white noise components among pul- 
sars. 

Comparing models M gw and M nu \\ will tell us whether or not 
there is evidence for any common red noise in our data but it 
will not necessarily tell us that this common noise is due to 
the stochastic GWB or some other common red noise source. 
Hence, a large Bayes factor £>(M gw ,M nu n|r) is necessary but 
not sufficient for detection. However, the comparison of mod- 
els M gw and M con can really give us information about the 
nature of the common red noise signal. As the two aforemen- 
tioned models are identical except for the spatial correlations, 
a large Bayes factor S(M gw ,M corr |r) will give us the odds that 
there is a common red noise process described spatial correla- 
tions C, a p- Since these spatial correlations are the signature of 
a stochastic GWB, the condition that this Bayes factor be large 
is both the necessary and sufficient condition for detection. In 
fact, this Bayes factor is closely related to si gnal-to-noise ra- 
tios in prev i ous detection schem es ( Jenet et a l.|2005||A nholm 
|et al.|2009[[Yardley et al.|2011||Chamberlin et al.|2013[ > that 
measure the significance of the cross correlations. 

This first order likelihood approximation has already been 
tested on the open and closed (Ellis et a l.|2012a| l IPTA Mock 
Data Challenge, where all challenges consisted of 130 data 
points per pulsar with 36 pulsars. For the closed data chal- 
lenge, we have com puted the Bayes fac tors mentioned in the 
previous section. In Ellis et al. d2012a) we have shown that 
we do indeed see very strong evidence for both a common red 
noise signal and a red noise signal with spatial correlations 
described by the Hellings and Downs coefficients. However, 
as we mentioned above, although in this case, the evidence 
for both models M gw and M corr is very high, as we expect, the 
Bayes factor £>(M gw ,M nu n) is much larger than S(M gw ,M corr ). 
For this reason, we expect that in analysis of real PTA data 
we will begin to see strong evidence for common red noise 
before we are able to see strong evidence for the expected 



cross correlations. In other words, as we gain more sensitiv- 
ity, the first two terms in Eq. 30 will dominate the likelihood 
function and the third term wiironly play a significant role as 
our sensitivity increases further. A full analysis of this feature 
along with projected sensitivity curves based on future pulsar 
timing campaigns and hardware upgrades will be explored in 
future work. 

closed MDCs. The open datasets, in particular open MDC 1 , 
act as illustrative cases where the first-order approximation 
breaks down and shows a large bias in parameter estimation. 
(Show plot or Rutger's results along with ours). 

4.3. The Empirical Distribution Function 

Here we will test the consistency and unbiasedness of our 
model through injections. Simply put, it is a type of hy- 
pothesis testing similar to the Kolmogorov-Smirnov test. In 
this test the null-hypothesis, our analysis method is internally 
consistent, is accepted when for x% of realizations, the true 
injected parameter lies within the inner x% of the marginal- 
iz ed posterior d istribution. A similar test was done recently 
in |van Haasteren & Levin| ( 2012| l in one dimension through 
the use of the empirical distribution function (EDF). Here 
we will review this method and generalize it to two dimen- 
sional marginalized posterior distributions. We define the 
inner high-probability region (HPR) of the two-dimensional 
marginalized posterior distribution as 



p(9 1 ,6 2 )d9 l d9 2 =a 



(38) 



W = {9 u 9 2 &R:p(9 u 9 2 )>L a }, 



where L a is some value > unique to each a that corresponds 
to a curve of equal probability in the two-dimensional pa- 
rameter space. In practice we lay down a grid in this two- 
dimensional parameter space and perform our search over the 
two parameters of interest (for the stochastic background we 
search over A and 7, the dimensionless strain amplitude and 
power spectral index of the GWB). We then define a set of 
points {Ai, 7,} € S a : /?(A,-,7;) > L a , that is to say we find all 
points in our grid that correspond to posterior values that lie 
inside our contour curve L a . To determine if the injected val- 
ues of {A true , 7tme} lie within the HPR we simply check to see 
if the injected values are consistent with the set S a . To do this 
we first define the complementary set to be S a such that points 



7 



that are in this set are outside or the HPR. Now we define two 
chi-squared functions in the parameter space 

r a \2 ( Ai~ -Atrue\ / 7/ — 7true\ 

XufAh'TiT = —. + ( — ) (39) 



-<*true / \ Ttrue / 

: / Aj-A tme \ 2 /7j-7ti- N ~ 



.true 



+ 



7n 



rue 



, (40) 



where {A,, 7,} and {Aj^j} are elements of the sets S„ and 
S a , respectively. Finally, we define the empirical distribution 
function (EDF) as 



F k (a) ■- 



1 

-^e(min^-minxa), 



(41) 



n=l 



where 9(x) is the Heaviside function. The term inside the 
sum indicates an event when the injected values are "closer" 
(in the chi-squared sense) to one of the elements of S a than to 
any of the elements of S a , therefore we can say that the values 

{At me , 7 tme } 

join the set S a and lie within the HPR defined in 
Eq. [38] Now that we have defined our EDF, the rest of the 



analysis mimics van Haasteren & Levin I 

For this analysis we simulated 1000 datasets for 6 differ- 
ent scenarios. In all cases we chose the white noise level 
to be 100 ns while we chose GWB amplitudes of 1 x 10~ 15 , 
2 x 10" 15 , and 3 x 10" 15 for PTAs with both 10 and 15 pul- 
sars with a 5 year baseline. Figure [2] shows the EDF for the 
six models outlined above. The thick lines denote a 10 pulsar 
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Figure 2. Empirical distribution function for 6 scenarios. The thick lines 
denote a 10 pulsar PTA and the thin lines denote a 15 pulsar PTA and the 
solid, dashed and dotted lines denote injected stochastic GWB amplitudes of 
1 X lfr 15 , 2 X ltr 15 , and 3 X lfr 15 , respectively. The solid lines at ±0.052 
represent the value at which we should reject the null-hypothesis that our 
analysis method is consistent and unbiased. 

PTA and the thin lines denote a 15 pulsar PTA and the solid, 
dashed and dotted lines denote injected stochastic GWB am- 
plitudes of 1 x 10" 15 , 2 x 10" 15 , and 3 x 10" 15 , respectively. 
The solid lines at ±0.052 represent the value at which we 
should reject the null-hypothesis that our analysis method is 
consistent and unbiased. Firstly, we note that for both the 
10 and 15 pulsar PTA, our analysis method is consistent for 
an injected amplitude of A = 1 x 10 15 . We obtain similar re- 
sults in the 10 pulsar case for amplitudes of A = 2 x 10~ 15 
and A = 3 x 10~ 15 . Here we do see that our method is in- 
deed slightly biased for these larger amplitudes but the degree 



of bias is almost negligible. However, for these same am- 
plitudes in the 15 pulsar case there is a significant bias. Even 
though there is a bias present in these scenarios, the EDF does 
not give information about how this bias presents itself in the 
two dimensional parameter space. In Figure [3] we show the 
two-dimensional scatter plot of the maximum likelihood pa- 
rameters from our Monte-Carlo simulations. It is clear that 
the bias in our two-dimensional parameter space of interest is 
practically very small. In fact the means of the distributions 
for A and 7 for the 10 pulsar case are (1.6,2.25,3.14) x 10" 15 
and (4.17,4.24,4.23), respectively and for the 15 pulsar case 
we obtain (1.56,2.29,3.22) x 10" 15 and (4.11,4.12,4.13), re- 
spectively. In the first row of Figure [3] we show the 10 pulsar 
case with increasing GWB amplitude and the second row we 
show the same for the 15 pulsar case. In the cases where 
there is a bias present, the likelihood function prefers slightly 
lower spectral indices and slightly larger amplitudes. How- 
ever, from our experience with the MDC this bias can also 
present itself by preferring a slightly higher spectral index and 
lower amplitude. It should be noted that even the smallest of 
the amplitudes tested here are near the upper range of the ex- 
pected level of the stochastic GWB (Sesana 2012) and that 
the white noise rms of the pulsars is slightly unrealistic in our 
current PTA regime. In fact we expect to have maybe five 
or six pulsars that time at or below the 100 ns level while we 
have many others that have much larger white noise rms. Thus 
we can conclude that even though our likelihood is somewhat 
biased at larger amplitudes (as is expected), for realistic astro- 
physically likely stochastic GWBs this method is effectively 
consistent and unbiased. In fact, in terms of setting upper 
limits on the stochastic GWB amplitude, this method is prac- 
tically identical to using the full likelihood, while much more 
computationally efficient. 

5. DISCUSSION AND CONCLUSIONS 

Here will will briefly discuss future prospects of conducting 
a simultaneous search for continuous GWs and the stochastic 
GWB. We will also compare our work to other recent efforts 
to speed up PTA GW data analysis and discuss the importance 
of our first-order likelihood method. 

5.1. Simultaneous Detection of Continuous GWs and a 
Stochastic GWB 

One very important feature of the first order likelihood 
method is that it can also be applied to searches for con- 
tinuous GWs. This will allow us to simultaneously search 
for a correlated stochastic background and resolve individ- 
ual sources that are bright enough to stand out above such a 
background. In standard continuous GW searches using PTAs 
( |Babak & Sesana|2012[ [Ellis et al.|20T2cl|Petiteau et al.|2012| i 
the assumption is made that any detectable single source will 
be bright enough such that the noise (e.g stochastic GWB) 
can be approximated as a gaussian process that is uncorre- 
lated among pulsars. However, recent work (|Ravi et al.|2012| l 
has shown that we are likely to see a few single sources per 
frequency bin that will stand out from the typical isotropic 
stochastic background, thus in order to resolve the weakest 
of these it is crucial to simultaneously search for a correlated 
stochastic background as well as the continuous source. We 
can then write down a combined likelihood function assuming 
a deterministic source of functional form s(A) 



p(r\e,X) = 



%/det27rS 



exp 



-(r-sfSTV-s) , (42) 
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Figure 3. Here we show the scatter of the maximum likelihood values of the GWB amplitude and spectral index from the Monte-Carlo simulations. From left 
to right the injected amplitudes are 1 X 10~' 5 , 2 X 10~ 15 , and 3 X 1CT 15 with spectral index 13/3 for a 10 pulsar PTA (top row) and 15 pulsar PTA (bottom row). 
We can see that nearly all of these distributions display minimal bias. 

where our noise (including the stochastic background) param- 
eters are 9 and our single source parameters are A. Using our 



first order likelihood approach we can approximate Eq. 42 



In p(r\e,X)=fa -- [Trln P + (r - s) r P _1 (r - s) - (r - s) r P _1 S C P _1 (r - s)] 



1 M 

—It. 



TrlnP a + (r a - s a ) T Pj (r a - s a ) - ^(r Q - s a ) T Pj S a pPf (rp - sp) 

Ma 



(43) 



As in the stochastic background case, this again will speed up 
computations because we only have to invert the individual 
auto-covariance matrices as opposed to the full data covari- 
ance matrix. Although there have been proposed methods to 
speed up the computatio n of the st ochastic likelihood func- 
tion of Eq. 15 (van Haasteren 2012}, this is not applicable to 
continuous sources because it relies on essentially applying a 
low pass filter to the data. However, since we expect continu- 
ous sources across the entire frequency band (with higher fre- 
quency sources possibly standing out above the background) 
we must keep all frequency information. Therefore our first 
order likelihood approximation is a viable option when look- 
ing to significantly speed up computation time while losing 
minimal information about potential GW signals. 



As always, to claim a detection we must do some sort of 
model comparison, be it a Neyman-Pearson test for Frequen- 
tist statistics or an odds ratio or Bayes factor for Bayesian 
statistics. For example if we want to assess the likelihood of 
that a continuous GW is in our data we want to compute the 
following Bayes factor 



■cw 



J J d\d6p(r\6,X)p(X)p(6) 



(44) 



oise / d0p(r\6)p(0) 

where Zcw and Z n0 i se are the evidence for the gravitational 
CW and noise models, respectively. However, notice that 9 
depends on our stochastic GWB parameters as we treat all 
stochastic processes as "noise" in this analysis. If we do not 
include the GWB parameters in the model then we could mis- 
take a low frequency GWB for a single continuous source, 
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thus including the GWB stochastic background in both mod- 
els is crucial to detection and eventually characterization of 
a single GW source. W e sh ould also mention that the bi- 
ases mentioned in section [43] are not as important if we sim- 
ply wish to let the noise parameters vary along with the sin- 
gle source parameters since these noise parameters will be 
marginalized over in the end. An exploration of these com- 
bined searches will be the subject of a future paper. 

5.2. Comparison with Other Work 

Recently there have been three studies devoted to mak- 
ing th e analysis of PTA d ata more computationally efficient. 
First, |van Haasteren| ( |2012| hereafter vH12) have developed 
a method dubbed Acceleration By Compression (ABC) to 
speed up this analysis. The main point of this work is to write 
the data in a compressed basis, keeping the minimum num- 
ber of basis vectors to maximize the ability to characterize 
a correlated red signal. This work also makes use of an in- 
terpolation scheme to compute the covariance matrix which 
further improves the efficiency of the algorithm at the cost 
of large memory usage. This method has proved to be very 
efficient and accurate in setting upper limits on the stochas- 
tic GWB and characterizing injected signals. However, since 
this method relies on a reduced basis that essentially "throws 
away" high frequency information it is impossible to obtain a 
reliable Bayes Factor when comparing models that allow for 
varying white noise components. Since our first-order likeli- 
hood function makes use of all the information in the data we 
can indeed compute reliable Bayes factors and make confi- 
dent statements about detection. We note however that the 
first-order likelihood of this work and the ABC method of 
vH12 are complementary. The two methods can in principle 
be combined for even greater efficiency. 

Most recently there have been two analyses of the IPTA 
MDC that aim to make the PTA data analysis more efficient. 
First, |Lentati et al.| ( |2012| l have developed a novel model- 
independent method for the estimation of the spectral prop- 
erties of an isotropic stochastic GWB. This method uses a 
frequency domain approach and is extremely efficient and re- 
sults in computational speedups of two to three orders of mag- 
nitude over the full likelihood implementation. It has also 
been extensively tested on the MDC datasets and has proved 
to be very accurate in characterizing the stochastic GWB. Our 
first order likelihood method is indeed complementary to this 
work as it provides a way to efficiently evaluate the likelihood 
function in a full time domain analysis which will be vital for 
cross-chec ks for real-life detection candidates. 

Finally, Tayl or et al.| ( |2012| l have implemented the full 
VHML likelihood function and have made it more effi- 



cient through the use of optimized linear algebra libraries 
with multithreading and parallelization resulting in significant 
speedups in the likelihood evaluation. However, all of these 
methods could just as well be applied to the first-order likeli- 
hood which would still be more efficient than the full likeli- 
hood by a factor proportional to the number of pulsars in the 
array. 

This work and recent work have shown that there has indeed 
been significant progress on making the likelihood evaluation 
more efficient for pulsar timing arrays. All of these methods 
are complementary and will provide important cross checks 
for future stochastic GWB detection candidates. 

5.3. Summary 

In this paper we have introduced a novel way to speed up 
the computation of the likelihood function for PTAs when 
searching for a stochastic GWB. This was accomplished by 
expanding the likelihood function to first order in the Hellings 
and Downs correlation coefficients expected for a stochastic 
GWB leading to a computational speedup on the order of the 
square of the number of pulsars in the PTA. For typical PTAs 
this results in a speed-up of a few hundred to about a thousand. 
We have briefly discussed the implementation of this tech- 
nique on the first IPTA Mock Data Challenge and showed that 
this algorithm performs well in extracting the injected GWB 
parameters and making a significant detection through vari- 
ous Bayes factors. Though this is indeed an approximation to 
the full likelihood function we have shown through extensive 
simulations that the bias introduced in the estimation of GWB 
parameters is minimal and negligible in many cases. This 
was accomplished through an analytical computation of the 
expectation value of the maximum likelihood, direct compar- 
isons of the full and first-order likelihood functions on simu- 
lated data sets and through a statistical Monte-Carlo approach 
based on the Empirical Distribution Function. Although this 
work has focused solely on the detection and characterization 
of a stochastic GWB, this likelihood function can also be used 
to estimate the intrinsic red and white noise parameters of in- 
dividual pulsars simultaneously with the GWB parameters. 
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APPENDIX 
RELATIONSHIP TO VHML LIKELIHOOD 

Making use of Eq. [6j the likelihood function for the noise can be written as 

P (n\8) = P (r\8M he J = ^1^^ X exp ^(r-M^ best ) r S^(r-M^ best ) 



(Al) 



This can be thought of as a conditional pdf, where the values of <5£ best are fixed. In van Haasteren & Levin ( 2012[ l it was shown 
that the marginalized likelihood can be written as 



P (t\0)= / d6tp(T\6,8Q: 



exp 



-±r r G r (G r S„G)"'G r r 



(A2) 



Vdet27rG r S„G 
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where G is the matrix constructed from the final (N—N^t) columns of the matrix U in the singular value decomposition of the 
design matrix, M = USV 7 ". 

We will now explore the G matrix and the R matrix obtained from the marginalized and conditional pdfs, respectively. As 
mentioned above, R can be thought of as an oblique projection operator that projects the pre-fit residuals into the post-fit residual 
space, whereas G T can be thought of a projection operator that projects our data onto the null space of M, that is, it projects the 
data into a subspace orthogonal to the timing model fit. Since R is not generally symmetric and therefore is an oblique projection 
operator, it does not have such a simple mathematical interpretation. However, we can recast our problem in terms of "weighted" 
residuals then we have the following transformations: r — > Wr, M — > WM, and R — »■ W~ l RW, where W is the weighting matrix 
defined above. In this case minimizing the chi-squared becomes an unweighted least squares problem and we obtain the exact 
same estimates of <5£ best and likelihood function as before. In this case R is symmetric and can be thought of as an orthogonal 
projection operator that projects our weighted data onto the null space of the weighted timing model (WM). However, in order 
to compute the likelihood we still have to invert the covariance matrix S r = RT, n R T which is singular. To do this we rely on the 
pseudo-inverse. The pseudo-inverse of S r is easiest defined in terms of its eigen-decomposition S r = EDE T , with E the matrix 
of eigenvectors of S r , and D the diagonal matrix with D„ = A, the eigenvalues of S r . It so happens that for a symmetric positive 
semi-definite matrices like these, the eigen-decomposition is also the singular value decomposition (SVD). The pseudo-inverse 
of E r is then 

T^ = ED =I D T , (A3) 

where the overbar indicates that we are taking a pseudo-inverse and D" 1 ,, = 1/A, for A > and D~ l u = otherwise. Note that when 
all the error bars are the same (i.e. W = a~ x I with a constant), the matrix G T T, n G has the same eigenvalues as the non-singular 
part of RT,„R T and we have 

(RE n R T )- 1 = G(G T Y, n Gy x G T . (A4) 



Thus we have obtained a very interesting result that in the case of uniform uncertainties, the conditional pdf making use of a 
pseudo-inverse is equivalent to the marginalized pdf making use of the projection matrix G T . However, in general this is not true 
and the two methods are indeed different. Although, in many cases the uncertainties are similar on a majority of the TOAs, thus 
the two methods will not differ much in practice. 
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